Search CORE

15 research outputs found

HEVC-videokoodekin intra-ennustuksen toteutus FPGA-piireille C-kielestä syntesoimalla

Author: Sjövall Panu
Publication venue
Publication date: 09/12/2015
Field of study

High Efficiency Video Coding (HEVC) is the latest video coding standard in video compression. With HEVC, it is possible to compress the video with half the bitrate compared to the previous video coding standard, Advanced Video Coding (AVC), with the same video quality. Now even, the complexity of the encoder is significantly larger. As designs become more and more complex, traditional hardware (HW) description languages (HDLs), such as Very High Speed Integrated Circuit Hardware Description Language (VHDL) or Verilog, can not be used to present the designs without increasing effort. The solution for this is a higher abstraction language for describing HW. High-Level Synthesis (HLS) is a way of using a programming language like C or C++ to describe the HW and automatically generating the HDL from it. This makes the code easier to understand and decreases the time used for implementing the design. This Thesis uses Catapult-C to create an HLS-based implementation of HEVC intra prediction for a Field Programmable Gate Array (FPGA). The HEVC encoder used in this Thesis is open source Kvazaar which has been developed at Tampere University of Technology. The objective is to implement an intra prediction accelerator faster than implementing it with register-transfer level (RTL) using VHDL or Verilog and still get comparable area and performance. This Thesis presents six development versions of the intra prediction accelerator. The complexity of the accelerator grows gradually, as more features were added to it. The final version is able to perform the intra prediction, mode cost computation and mode decision for Full HD video at 24.5 fps using 11 662 adaptive logic modules (ALMs) on an Altera Cyclone V FPGA. This Thesis presents the benefits of Catapult-C and HLS. The implementation results were comparable to hand coded RTL but achieved with a fraction of the estimated time for a VHDL implementation. As a rough estimate, if something takes a month to implement in VHDL, it takes a week with HLS. The biggest gain with HLS is the fast process of changes. Only the C implementation needs to change. The testbench and the RTL-code are generated automatically

Trepo - Institutional Repository of Tampere University

Feasibility Study of High-Level Synthesis : Implementation of a Real-Time HEVC Intra Encoder on FPGA

Author: Sjövall Panu
Publication venue: Tampere University
Publication date: 17/03/2023
Field of study

High-Level Synthesis (HLS) on automatisoitu suunnitteluprosessi, joka pyrkii parantamaan tuottavuutta perinteisiin suunnittelumenetelmiin verrattuna, nostamalla suunnittelun abstraktiota rekisterisiirtotasolta (RTL) käyttäytymistasolle. Erilaisia kaupallisia HLS-työkaluja on ollut markkinoilla aina 1990-luvulta lähtien, mutta vasta äskettäin ne ovat alkaneet saada hyväksyntää teollisuudessa sekä akateemisessa maailmassa. Hidas käyttöönottoaste on johtunut pääasiassa huonommasta tulosten laadusta (QoR) kuin mitä on ollut mahdollista tavanomaisilla laitteistokuvauskielillä (HDL). Uusimmat HLS-työkalusukupolvet ovat kuitenkin kaventaneet QoR-aukkoa huomattavasti. Tämä väitöskirja tutkii HLS:n soveltuvuutta videokoodekkien kehittämiseen. Se esittelee useita HLS-toteutuksia High Eﬃciency Video Coding (HEVC) -koodaukselle, joka on keskeinen mahdollistava tekniikka lukuisille nykyaikaisille mediasovelluksille. HEVC kaksinkertaistaa koodaustehokkuuden edeltäjäänsä Advanced Video Coding (AVC) -standardiin verrattuna, saavuttaen silti saman subjektiivisen visuaalisen laadun. Tämä tyypillisesti saavutetaan huomattavalla laskennallisella lisäkustannuksella. Siksi reaaliaikainen HEVC vaatii automatisoituja suunnittelumenetelmiä, joita voidaan käyttää rautatoteutus- (HW ) ja varmennustyön minimoimiseen. Tässä väitöskirjassa ehdotetaan HLS:n käyttöä koko enkooderin suunnitteluprosessissa. Dataintensiivisistä koodaustyökaluista, kuten intra-ennustus ja diskreetit muunnokset, myös enemmän kontrollia vaativiin kokonaisuuksiin, kuten entropiakoodaukseen. Avoimen lähdekoodin Kvazaar HEVC -enkooderin C-lähdekoodia hyödynnetään tässä työssä referenssinä HLS-suunnittelulle sekä toteutuksen varmentamisessa. Suorituskykytulokset saadaan ja raportoidaan ohjelmoitavalla porttimatriisilla (FPGA). Tämän väitöskirjan tärkein tuotos on HEVC intra enkooderin prototyyppi. Prototyyppi koostuu Nokia AirFrame Cloud Server palvelimesta, varustettuna kahdella 2.4 GHz:n 14-ytiminen Intel Xeon prosessorilla, sekä kahdesta Intel Arria 10 GX FPGA kiihdytinkortista, jotka voidaan kytkeä serveriin käyttäen joko peripheral component interconnect express (PCIe) liitäntää tai 40 gigabitin Ethernettiä. Prototyyppijärjestelmä saavuttaa reaaliaikaisen 4K enkoodausnopeuden, jopa 120 kuvaa sekunnissa. Lisäksi järjestelmän suorituskykyä on helppo skaalata paremmaksi lisäämällä järjestelmään käytännössä minkä tahansa määrän verkkoon kytkettäviä FPGA-kortteja. Monimutkaisen HEVC:n tehokas mallinnus ja sen monipuolisten ominaisuuksien mukauttaminen reaaliaikaiselle HW HEVC enkooderille ei ole triviaali tehtävä, koska HW-toteutukset ovat perinteisesti erittäin aikaa vieviä. Tämä väitöskirja osoittaa, että HLS:n avulla pystytään nopeuttamaan kehitysaikaa, tarjoamaan ennen näkemätöntä suunnittelun skaalautuvuutta, ja silti osoittamaan kilpailukykyisiä QoR-arvoja ja absoluuttista suorituskykyä verrattuna olemassa oleviin toteutuksiin.High-Level Synthesis (HLS) is an automated design process that seeks to improve productivity over traditional design methods by increasing design abstraction from register transfer level (RTL) to behavioural level. Various commercial HLS tools have been available on the market since the 1990s, but only recently they have started to gain adoption across industry and academia. The slow adoption rate has mainly stemmed from lower quality of results (QoR) than obtained with conventional hardware description languages (HDLs). However, the latest HLS tool generations have substantially narrowed the QoR gap. This thesis studies the feasibility of HLS in video codec development. It introduces several HLS implementations for High Eﬃciency Video Coding (HEVC) , that is the key enabling technology for numerous modern media applications. HEVC doubles the coding eﬃciency over its predecessor Advanced Video Coding (AVC) standard for the same subjective visual quality, but typically at the cost of considerably higher computational complexity. Therefore, real-time HEVC calls for automated design methodologies that can be used to minimize the HW implementation and veriﬁcation eﬀort. This thesis proposes to use HLS throughout the whole encoder design process. From data-intensive coding tools, like intra prediction and discrete transforms, to more control-oriented tools, such as entropy coding. The C source code of the open-source Kvazaar HEVC encoder serves as a design entry point for the HLS ﬂow, and it is also utilized in design veriﬁcation. The performance results are gathered with and reported for ﬁeld programmable gate array (FPGA) . The main contribution of this thesis is an HEVC intra encoder prototype that is built on a Nokia AirFrame Cloud Server equipped with 2.4 GHz dual 14-core Intel Xeon processors and two Intel Arria 10 GX FPGA Development Kits, that can be connected to the server via peripheral component interconnect express (PCIe) generation 3 or 40 Gigabit Ethernet. The proof-of-concept system achieves real-time. 4K coding speed up to 120 fps, which can be further scaled up by adding practically any number of network-connected FPGA cards. Overcoming the complexity of HEVC and customizing its rich features for a real-time HEVC encoder implementation on hardware is not a trivial task, as hardware development has traditionally turned out to be very time-consuming. This thesis shows that HLS is able to boost the development time, provide previously unseen design scalability, and still result in competitive performance and QoR over state-of-the-art hardware implementations

Trepo - Institutional Repository of Tampere University

HEVC-videokoodekin intra-ennustuksen toteutus FPGA-piireille C-kielestä syntesoimalla

Author: Sjövall Panu
Publication venue
Publication date: 09/12/2015
Field of study

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

TUT DPub

High-level Synthesis Implementation of an Accurate HEVC Interpolation Filter on an FPGA

Author: Lemmetti Ari
Rasinen Matti
Sjövall Panu
Vanne Jarno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2021
Field of study

This paper presents the first known high-level synthesis (HLS) implementation of an accurate interpolation filter for High Efficiency Video Coding (HEVC). The proposed multiplierless shift-register-based architecture is able to interpolate effectively up to four 8×8 blocks at a time for HEVC fractional motion estimation (FME). Our filter is implemented on Intel Arria V and Xilinx Virtex 6 FPGAs. On Arria V, it can operate at 270 MHz with 21.1 kALUTs. According to our profiling results, it can filter an adequate number of samples for FME in real-time 4K HEVC encoding of up to 85 frames per second (fps). On Virtex 6, the respective values are 313 MHz, 27.1 kLUTs, and 99 fps. The proposed solution doubles the speed over any of the existing interpolation filters for HEVC FME on an FPGA. It is also the only interpolation filter that meets the needs of real-time 4K HEVC encoder in practice and without any compromises in 23-bit filtering accuracy.acceptedVersionPeer reviewe

Trepo - Institutional Repository of Tampere University

High-level synthesis implementation of transform-exempted SATD architectures for low-power video coding

Author: Lemmetti Ari
Partanen Tero
Sjövall Panu
Vanne Jarno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

This paper presents the first known high-level synthesis (HLS) implementation for the Sum of Absolute Transformed Differences (SATD) calculation. The proposed hardware architecture is designed for two SATD algorithms: a widespread Fast Walsh-Hadamard Transform (FWHT-SATD) and a recently introduced Transform Exempted scheme (TE-SATD). This 2-stage architecture is made up of two 1-D Walsh-Hadamard Transform (WHT) stages and a transpose buffer (TB) between them. The chosen HLS approach cuts down design time over contemporary design methods and thereby made it feasible to implement a set of dedicated FWHT-SATD and TE-SATD architectures for 4×4, 8×8, and 16×16 pixel blocks. All these six architectures were synthesized for 28 nm and 45 nm standard cell technologies, and their area and energy consumptions were analysed. TE-based implementations provide 6.0-8.3% total cell area savings and 6.9-12.7% better energy-efficiency than traditional FWHT approaches. Our proposal is the first to introduce TE-SATD architectures for up to 16×16 blocks and each of these tailored architectures was shown to provide better trade-off between silicon area and performance than their reference implementations.acceptedVersionPeer reviewe

Trepo - Institutional Repository of Tampere University

Are We There Yet? A Study on the State of High-level Synthesis

Author: Hämäläinen Timo D.
Lahti Sakari
Sjövall Panu
Vanne Jarno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019
Field of study

To increase productivity in designing digital hardware components, high-level synthesis (HLS) is seen as the next step in raising the design abstraction level. However, the quality of results (QoR) of HLS tools has tended to be behind those of manual register-transfer level (RTL) flows. In this paper, we survey the scientific literature published since 2010 about the QoR and productivity differences between the HLS and RTL design flows. Altogether, our survey spans 46 papers and 118 associated applications. Our results show that on average, the QoR of RTL flow is still better than that of the state-of-the-art HLS tools. However, the average development time with HLS tools is only a third of that of the RTL flow, and a designer obtains over four times as high productivity with HLS. Based on our findings, we also present a model case study to sum up the best practices in comparative studies between HLS and RTL. The outcome of our case study is also in line with the survey results, as using an HLS tool is seen to increase the productivity by a factor of six. In addition, to help close the QoR gap, we present a survey of literature focused on improving HLS. Our results let us conclude that HLS is currently a viable option for fast prototyping and for designs with short time to market.acceptedVersionPeer reviewe

Trepo - Institutional Repository of Tampere University

High-level synthesis implementation of HEVC 2-D DCT/DST on FPGA

Author: Hämäläinen Timo D.
Sjövall Panu
Vanne Jarno
Viitamäki Vili
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

This paper presents the first known high-level synthesis (HLS) implementation of integer discrete cosine transform (DCT) and discrete sine transform (DST) for High Efficiency Video Coding (HEVC). The proposed approach implements these 2-D transforms by two successive 1-D transforms using a well-known row-column and Even-Odd decomposition techniques. Altogether, the proposed architecture is composed of a 4-point DCT/DST unit for the smallest transform blocks (TBs), an 8/16/32-point DCT unit for the other TBs, and a transpose memory for intermediate results. On Arria II FPGA, the low-cost variant of the proposed architecture is able to support encoding of 1080p format at 60 fps and at the cost of 10.0 kALUTs and 216 DSP blocks. The respective figures for the proposed high-speed variant are 2160p at 30 fps with 13.9 kALUTs and 344 DSP blocks. These cost-performance characteristics outperform respective non-HLS approaches on FPGA.acceptedVersionPeer reviewe

Crossref

Trepo - Institutional Repository of Tampere University

High-level synthesized 2-D IDCT/IDST implementation for HEVC codecs on FPGA

Author: Hämäläinen Timo D.
Sjövall Panu
Vanne Jarno
Viitamäki Vili
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

This paper presents efficient inverse discrete cosine transform (IDCT) and inverse discrete sine transform (IDST) implementations for High Efficiency Video Coding (HEVC). The proposal makes use of high-level synthesis (HLS) to implement a complete HEVC 2-D IDCT/IDST architecture directly from the C code of a well-known Even-Odd decomposition algorithm. The final architecture includes a 4-point IDCT/IDST unit for the smallest transform blocks (TB), an 8/16/32-point IDCT unit for the other TBs, and a transpose memory for intermediate results. On Arria II FPGA, it supports real-time (60 fps) HEVC decoding of up to 2160p format with 12.4 kALUTs and 344 DSP blocks. Compared with the other existing HLS approach, the proposed solution is almost 5 times faster and is able to utilize available FPGA resources better.acceptedVersionPeer reviewe

Trepo - Institutional Repository of Tampere University

Visualization of Dynamic Resource Allocation for HEVC Encoding in FPGA-Accelerated SDN Cloud

Author: Hämäläinen Timo
Oinonen Arto
Sjövall Panu
Teuho Mikko
Vanne Jarno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

This paper describes a demonstration setup to visualize dynamic resource allocation for real-time HEVC encoding services in FPGA-accelerated cloud. The demonstrated application is Kvazaar HEVC intra encoder, whose functionality is partitioned between FPGAs and processors. During the demonstration, several encoding services can be invoked with requests to the resource manager, which is responsible for allocation, deallocation, and load balancing of resources in the network. The manager provides JSON data to the visualizer, which uses D3 JavaScript library to visualize 1) the physical network structure; 2) running services; and 3) performance of the network elements. This interactive demonstration allows users to request new video streams, view the encoded streams, observe the visualization of the network and services, and manually turn on/off resources to test the robustness of the system.acceptedVersionPeer reviewe

Crossref

Trepo - Institutional Repository of Tampere University

FPGA-Powered 4K120p HEVC Intra Encoder

Author: Hämäläinen Timo
Kulmala Ari
Sjövall Panu
Vanne Jarno
Viitamäki Vili
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

This paper presents a hardware-accelerated Kvazaar HEVC intra encoder for 4K real-time video coding at up to 120 fps. The encoder is implemented on a Nokia AirFrame Cloud Server featuring a 2.4 GHz dual 14-core Intel Xeon processor and two Arria 10 PCI Express FPGA accelerator cards. The presented encoder is a speed-optimized version of our 1st generation 4K40p HEVC intra encoder. The proposed speedup techniques include 1) Increasing the number of FPGA cards to two; 2) Remapping the simplest multiplications from DSP blocks to logic for better FPGA utilization; 3) Making task scheduling more flexible to improve utilization rate of hardware accelerators; and 4) Increasing the pipeline depth and duplicating time-sensitive resources in the hardware accelerator. As a result, up to three hardware accelerator instances can be accommodated in a single Arria 10 so the encoder is able to make use of six accelerators. According to our experiments, the proposed encoder obtains threefold speedup over our 1st generation encoder. Our proposal is also shown to outperform all other encountered FPGA and ASIC implementations.acceptedVersionPeer reviewe

Crossref

Trepo - Institutional Repository of Tampere University